Linear time algorithms for finding and representing all the tandem repeats in a string

Author: Gusfield Dan
Stoye Jens
Publication venue: Elsevier BV
Publication date: 01/01/2004

Gusfield D, Stoye J. Linear time algorithms for finding and representing all the tandem repeats in a string. Journal of Computer and System Sciences. 2004;69(4):525-546.

A tandem repeat (or square) is a string αα, where α is a non-empty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concerning tandem repeats and tandem arrays in S. This improves and generalizes several prior efforts to efficiently capture large subsets of tandem repeats.

Elsevier - Publisher Connector

Publications at Bielefeld University
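
The definition of a tandem repeat in the Gusfield and Stoye abstract above can be illustrated with a minimal sketch. This is a naive cubic-time enumeration written only to make the definition concrete; it is not the linear-time suffix-tree algorithm of the paper, and the function name `tandem_repeats` is hypothetical.

```python
def tandem_repeats(s):
    """Return (start, half_length) for every occurrence of a square in s.

    A square is a substring of the form alpha + alpha with alpha non-empty.
    Naive enumeration for illustration only; not the paper's algorithm.
    """
    n = len(s)
    found = []
    for start in range(n):
        # half is |alpha|; the square occupies s[start : start + 2 * half]
        for half in range(1, (n - start) // 2 + 1):
            if s[start:start + half] == s[start + half:start + 2 * half]:
                found.append((start, half))
    return found


if __name__ == "__main__":
    # Each pair is (start index, length of alpha)
    print(tandem_repeats("abab"))
    print(tandem_repeats("aaa"))
```

Each reported pair identifies one occurrence, so a string such as "aaa" yields two overlapping occurrences of the square "aa".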

Balanced Vertices in Trees and a Simpler Algorithm to Compute the Genomic Distance

Author: Bergeron
Gyárfás
Hannenhalli
Jens Stoye
Lajos Soukup
Péter L. Erdős
Publication venue: Elsevier BV
Publication date: 15/04/2010

This paper provides a short and transparent solution for the covering cost of white-grey trees, which play a crucial role in the algorithm of Bergeron et al. to compute the rearrangement distance between two multichromosomal genomes in linear time (Theor. Comput. Sci., 410:5300-5316, 2009). In the process it introduces a new center notion for trees, which seems to be interesting on its own. Comment: 6 pages, submitte

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Repository of the Academy's Library

Taxonomic classification of metagenomic shotgun sequences with CARMA3

Author: Gerlach Wolfgang
Stoye Jens
Publication venue: Oxford University Press
Publication date: 01/01/2011

The vast majority of microbes are unculturable and thus cannot be sequenced by means of traditional methods. High-throughput sequencing techniques like 454 or Solexa-Illumina make it possible to explore those microbes by studying whole natural microbial communities and analysing their biological diversity as well as the underlying metabolic pathways. Over the past few years, different methods have been developed for the taxonomic and functional characterization of metagenomic shotgun sequences. However, the taxonomic classification of metagenomic sequences from novel species without close homologue in the biological sequence databases poses a challenge due to the high number of wrong taxonomic predictions on lower taxonomic ranks. Here we present CARMA3, a new method for the taxonomic classification of assembled and unassembled metagenomic sequences that has been adapted to work with both BLAST and HMMER3 homology searches. We show that our method makes fewer wrong taxonomic predictions (at the same sensitivity) than other BLAST-based methods. CARMA3 is freely accessible via the web application WebCARMA from http://webcarma.cebitec.uni-bielefeld.de

PubMed Central

Publications at Bielefeld University

Large scale hierarchical clustering of protein sequences

Author: Krause Antje
Stoye Jens
Vingron Martin
Publication venue: American Fisheries Society
Publication date: 05/07/2007

Background: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to. Results: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at http://systers.molgen.mpg.de/. Conclusions: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences

Large scale hierarchical clustering of protein sequences

Author: Krause Antje
Stoye Jens
Vingron Martin
Publication venue: BioMed Central
Publication date: 01/01/2005

Background: Searching a biological sequence database with a query sequence looking for homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines it is still virtually impossible to identify quickly and clearly a group of sequences that a given query sequence belongs to. Results: We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences resulting in a hierarchical clustering which is being made available for querying and browsing at http://systers.molgen.mpg.de/. Conclusions: Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

MPG.PuRe

Efficient implementation of lazy suffix trees

Author: Giegerich Robert
Kurtz Stefan
Stoye Jens
Publication venue: Wiley
Publication date: 01/01/2003

Giegerich R, Kurtz S, Stoye J. Efficient implementation of lazy suffix trees. Software: Practice and Experience. 2003;33(11):1035-1049.

We present an efficient implementation of a write-only top-down construction for suffix trees. Our implementation is based on a new, space-efficient representation of suffix trees that requires only 12 bytes per input character in the worst case, and 8.5 bytes per input character on average for a collection of files of different types. We show how to efficiently implement the lazy evaluation of suffix trees such that a subtree is evaluated only when it is traversed for the first time. Our experiments show that for the problem of searching many exact patterns in a fixed input string, the lazy top-down construction is often faster and more space efficient than other methods. Copyright (C) 2003 John Wiley & Sons, Ltd.

Publications at Bielefeld University

Online Abelian Pattern Matching

Author: Ejaz Tahir
Rahmann Sven
Stoye Jens
Publication venue: Technische Fakultät der Universität Bielefeld
Publication date: 01/01/2008

Ejaz T, Rahmann S, Stoye J. Online Abelian Pattern Matching. Forschungsberichte der Technischen Fakultät, Abteilung Informationstechnik / Universität Bielefeld. Bielefeld: Technische Fakultät der Universität Bielefeld; 2008.

An abelian pattern describes the set of strings that consist of the same combination of characters. Given an abelian pattern P and a text T ∈ Σ^n, the task is to find all occurrences of P in T, i.e. all substrings S = T_i...T_j such that the frequency of each character in S matches the specified frequency of that character in P. In this report we present simple online algorithms for abelian pattern matching, and give a lower bound for online algorithms which is Ω(n).

Publications at Bielefeld University
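
The matching task described in the Ejaz, Rahmann and Stoye abstract above can be sketched with a sliding window over character counts. This is a generic textbook-style approach shown under the assumption that the pattern is given as a plain string whose character frequencies define P; it is not claimed to be one of the report's algorithms, and the name `abelian_matches` is hypothetical.

```python
from collections import Counter


def abelian_matches(pattern, text):
    """Return start indices i such that text[i:i+m] has exactly the
    character frequencies of pattern, where m = len(pattern).

    The text is read left to right one character at a time (online).
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    need = Counter(pattern)   # required frequency of each character
    window = Counter()        # frequencies inside the current window
    matches = []
    for j, ch in enumerate(text):
        window[ch] += 1
        if j >= m:
            # Drop the character that just left the window
            old = text[j - m]
            window[old] -= 1
            if window[old] == 0:
                del window[old]
        # Comparing the two counters costs time proportional to the
        # number of distinct characters present
        if j >= m - 1 and window == need:
            matches.append(j - m + 1)
    return matches


if __name__ == "__main__":
    print(abelian_matches("ab", "abba"))
```

Because every occurrence has length m, a single fixed-size window suffices, and each text character is added to and removed from the window at most once.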

A Linear Time Algorithm for an Extended Version of the Breakpoint Double Distance

Author: Brockmann Leonie R.
Klerx Katharina
Stoye Jens
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 22nd International Workshop on Algorithms in Bioinformatics (WABI 2022)
Publication date: 01/01/2022

Dagstuhl Research Online Publication Server

Fast and Simple Jumbled Indexing for Binary Run-Length Encoded Strings

Author: Dantas Simone
Gagie Travis
Kowada Luis
Stoye Jens
Wittler Roland
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 28th Annual Symposium on Combinatorial Pattern Matching (CPM 2017)
Publication date: 01/01/2017

Important papers have appeared recently on the problem of indexing binary strings for jumbled pattern matching, and further lowering the time bounds in terms of the input size would now be a breakthrough with broad implications. We can still make progress on the problem, however, by considering other natural parameters. Badkobeh et al. (IPL, 2013) and Amir et al. (TCS, 2016) gave algorithms that index a binary string in O(n + r^2 log r) time, where n is the length and r is the number of runs, and Giaquinta and Grabowski (IPL, 2013) gave one that runs in O(n + r^2) time. In this paper we propose a new and very simple algorithm that also runs in O(n + r^2) time and can be extended either so that the index returns the position of a match (if there is one), or so that the algorithm uses only O(n) bits of space instead of O(n) words

Dagstuhl Research Online Publication Server
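
To make the indexing problem in the jumbled-indexing abstract above concrete, here is a minimal sketch of a binary jumbled index. It relies on the well-known property that, for a binary string, the number of 1s among all windows of a fixed length k takes every value between its minimum and maximum, so storing those two numbers per k is enough to answer existence queries. The construction below is the naive quadratic one, not the O(n + r^2) algorithm of the paper, and the names `build_index` and `has_match` are hypothetical.

```python
def build_index(bits):
    """Build (lo, hi) where lo[k] and hi[k] are the minimum and maximum
    number of '1' characters in any length-k substring of bits.

    Naive O(n^2) construction for illustration only.
    """
    n = len(bits)
    prefix = [0]
    for b in bits:
        prefix.append(prefix[-1] + (1 if b == "1" else 0))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for k in range(1, n + 1):
        counts = [prefix[i + k] - prefix[i] for i in range(n - k + 1)]
        lo[k] = min(counts)
        hi[k] = max(counts)
    return lo, hi


def has_match(index, zeros, ones):
    """Report whether some substring has exactly the given numbers of
    0s and 1s, using the interval property of binary strings."""
    lo, hi = index
    k = zeros + ones
    return 0 < k < len(lo) and lo[k] <= ones <= hi[k]


if __name__ == "__main__":
    idx = build_index("0110")
    print(has_match(idx, 1, 1), has_match(idx, 2, 0))
```

The index uses two integers per window length, which corresponds to the O(n) words of space mentioned in the abstract; queries then take constant time.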